Metric Techniques for High-Dimensional Indexing

Author

  • Christian Digout
Abstract

Despite the proposal of numerous tree-based structures for high-dimensional similarity search, techniques based on a sequential scan, such as the VA-File, have been shown to be quite effective. In this thesis we present three new access structures that use sequential access patterns to efficiently answer similarity queries over high-dimensional vector and metric data. Two of these access structures are designed to answer range queries, while the third is intended for nearest neighbor queries. All three organize preprocessed data and reorder the original data sequentially on disk. At query time, portions of the data are read sequentially, avoiding the expensive random disk I/Os that keep many access methods for high-dimensional and metric data from being efficient. Experimental results show that the first two proposed access structures can process range queries up to 24 times faster than the VA-File and up to 69 times faster than a sequential scan of the data set. The third proposed method, designed for nearest neighbor queries, is up to 15 times faster than the VA-File and more than 40 times faster than a sequential scan of the data set. Unlike the VA-File, the access structures presented in this thesis work in general metric spaces as well as in vector spaces (under a metric distance), scale well as the range-query radius and the number of nearest neighbors retrieved increase, and can be easily implemented.
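The abstract does not describe the internals of the three access structures, so the following is only a minimal sketch of the general idea it states: reorder the data at preprocessing time so that a range query reads one contiguous portion. It uses a generic single-pivot filter based on the triangle inequality, which is an assumption chosen for illustration and not the thesis's method. The names `PivotSortedFile`, `range_query`, and `euclidean` are hypothetical, and an in-memory list stands in for the sequential file on disk.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; any metric distance could be substituted."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotSortedFile:
    """Hypothetical illustration: objects stored in order of distance to a pivot.

    An in-memory list plays the role of the sequentially laid-out file.
    """

    def __init__(self, points: Sequence[Sequence[float]], pivot: Sequence[float],
                 dist: Callable[[Sequence[float], Sequence[float]], float] = euclidean):
        self.dist = dist
        self.pivot = tuple(pivot)
        # Preprocessing: compute each object's distance to the pivot and
        # reorder the data by that key.
        keyed = sorted(((dist(p, self.pivot), tuple(p)) for p in points),
                       key=lambda kp: kp[0])
        self.keys: List[float] = [k for k, _ in keyed]
        self.data: List[Point] = [p for _, p in keyed]

    def range_query(self, q: Sequence[float], r: float) -> Tuple[List[Point], int]:
        """Return (objects within distance r of q, number of objects scanned)."""
        dq = self.dist(q, self.pivot)
        # Triangle inequality: d(q, p) <= r implies |d(p, pivot) - dq| <= r,
        # so every answer lies in one contiguous slice of the sorted data.
        lo = bisect.bisect_left(self.keys, dq - r)
        hi = bisect.bisect_right(self.keys, dq + r)
        candidates = self.data[lo:hi]  # the single sequential read
        hits = [p for p in candidates if self.dist(q, p) <= r]
        return hits, len(candidates)
```

Because only the metric axioms are used, the same sketch applies to non-vector data once `dist` is replaced by another metric. A single pivot filters weakly in high dimensions, so this should be read as an illustration of the access pattern rather than as a competitive index.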

Similar resources

Metric-Based Shape Retrieval in Large Databases

This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...

A Divisive Hierarchical Clustering-Based Method for Indexing Image Information

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Much effort has gone into developing multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

Kernel-Based Cardinality Estimation on Metric Data

The efficient management of metric data is extremely important in many challenging applications as they occur e.g. in the life sciences. Here, data typically cannot be represented in a vector space. Instead, a distance function only allows comparing individual elements with each other to support distance queries. As high-dimensional data suffers strongly from the curse of dimensionality, distan...

On the Surprising Behavior of Distance Metrics in High Dimensional Space

In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, ...

Similarity Search on Bregman Divergence: Towards Non-Metric Indexing

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergence [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition ...

On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the ...

Publication date: 2004